1 Machine learning in DNA microarray analysis for cancer classification
Sung-Bae Cho, Hong-Hee Won 

January 2003 Proceedings of the First Asia-Pacific bioinformatics conference on 
Bioinformatics 2003 - Volume 19 CRPITS '03 

Additional Information: full citation , abstract , references , citings , index 
terms 



Full text available: 



The development of microarray technology has supplied a large volume of data to many 
fields. In particular, it has been applied to prediction and diagnosis of cancer, so that it 
expectedly helps us to exactly predict and diagnose cancer. To precisely classify cancer we 
have to select genes related to cancer because extracted genes from microarray have many 
noises. In this paper, we attempt to explore many features and classifiers using three 
benchmark datasets to systematically evaluate the pert ... 

Keywords: KNN, MLP, SASOM, SVM, biological data mining, classification, ensemble 
classifier, feature selection, gene expression profile 

2 Cluster ensembles — a knowledge reuse framework for combining multiple partitions
Alexander Strehl, Joydeep Ghosh 



March 2003 The Journal of Machine Learning Research, volume 3 

Additional Information: full citation , abstract , references , citings , index 
terms 



Full text available: pdf(842.50 KB)



This paper introduces the problem of combining multiple partitionings of a set of objects into 
a single consolidated clustering without accessing the features or algorithms that 
determined these partitionings. We first identify several application scenarios for the 
resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble 
problem is then formalized as a combinatorial optimization problem in terms of shared 
mutual information. In addition to a direct ... 

Keywords: cluster analysis, clustering, consensus functions, ensemble, knowledge reuse, 
multi-learner systems, mutual information, partitioning, unsupervised learning 



3 Supervised adaptive resonance networks 
R. S. Baxter 

May 1991 Proceedings of the conference on Analysis of neural network applications 
4 Industry/government track papers: Effective localized regression for damage detection in large complex mechanical structures

Aleksandar Lazarevic, Ramdev Kanapady, Chandrika Kamath 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(597.35 KB) Additional Information: full citation , abstract , references, index terms 

In this paper, we propose a novel data mining technique for the efficient damage detection 
within the large-scale complex mechanical structures. Every mechanical structure is defined 
by the set of finite elements that are called structure elements. Large-scale complex 
structures may have extremely large number of structure elements, and predicting the 
failure in every single element using the original set of natural frequencies as features is 
exceptionally time-consuming task. Traditional data m ... 

Keywords: clustering, damage detection, localized regression, mechanical structures, 
structure elements 



5 Visualizing content based relations in texts
Edgar Weippl 

January 2001 Australian Computer Science Communications , Proceedings of the 2nd Australasian conference on User interface AUIC '01, Volume 23 issue 5

Full text available: pdf(1.47 MB), Publisher Site Additional Information: full citation , abstract , references

Our goal is to efficiently visualize a medium sized hypertext database containing 500 - 
20000 articles. The visualization technique we propose is an Information Landscape. 
Basically, the information landscape maps texts into a 2D plane so that related texts are 
placed next to each other. The hypertexts' location is calculated according to their content 
and not according to their links. Combining already published algorithms the clustering 
works very well. An important issue, however, is a well-d ... 

Keywords: UIs/visualization for hypertext search and navigation, information visualization
for IR, stemming/morphological analysis, text categorization, text clustering 



6 Machine learning in automated text categorization 
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), Volume 34 issue 1

Full text available: pdf(524.41 KB) Additional Information: full citation , abstract , references , citings , index terms

The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning techniques: 
a general inductive process automatically builds a classifier by learning, from a set of 
preclassified documents, the characteristics of the categories. ... 

Keywords: Machine learning, text categorization, text classification 
7 Data mining of multidimensional remotely sensed images
Robert F. Cromp, William J. Campbell

December 1993 Proceedings of the second international conference on Information and 
knowledge management 

Full text available: pdf(1.39 MB) Additional Information: full citation , references , citings , index terms



8 A sub Bayesian nearest prototype neural network with fuzzy interpretability for 
diagnosis problems 

Saman Halgamuge, Christoph Grimm, Manfred Glesner 

February 1995 Proceedings of the 1995 ACM symposium on Applied computing 

Full text available: pdf(508.72 KB) Additional Information: full citation , references , citings , index terms



Keywords: Bayes classifier, fuzzy rules, neural networks, rule generation 

9 Fuzzy clustering improves convergence of the backpropagation algorithm
Adel M. Abunawass, Opinderjit Singh Bhella, Min Ding, Weiqun Li 
February 1998 Proceedings of the 1998 ACM symposium on Applied Computing 

Full text available: pdf(387.70 KB) Additional Information: full citation , references , index terms



Keywords: artificial intelligence, backpropagation, fuzzy clustering, fuzzy logic, neural 
networks 



10 Contributed papers: Data-oriented methods for grapheme-to-phoneme conversion
Antal van den Bosch, Walter Daelemans 

April 1993 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics

Full text available: pdf(898.83 KB), Publisher Site Additional Information: full citation , abstract , references , citings

It is traditionally assumed that various sources of linguistic knowledge and their interaction 
should be formalised in order to be able to convert words into their phonemic 
representations with reasonable accuracy. We show that using supervised learning 
techniques, based on a corpus of transcribed words, the same and even better performance 
can be achieved, without explicit modeling of linguistic knowledge. In this paper we present 
two instances of this approach. A first model implements a varian ... 

11 Research track papers: Data mining in metric space: an empirical analysis of 
supervised learning performance criteria 
Rich Caruana, Alexandru Niculescu-Mizil 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(267.16 KB) Additional Information: full citation , abstract , references, index terms

Many criteria can be used to evaluate the performance of supervised learning. Different 
criteria are appropriate in different settings, and it is not always clear which criteria to use. 
A further complication is that learning methods that perform well on one criterion may not 
perform well on other criteria. For example, SVMs and boosting are designed to optimize 
accuracy, whereas neural nets typically optimize squared error or cross entropy. We 
conducted an empirical study using a variety of lea ...

Keywords: ROC, cross entropy, lift, metrics, performance evaluation, precision, recall, 
supervised learning 



12 Magical thinking in data mining: lessons from CoIL challenge 2000
Charles Elkan 

August 2001 Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

Full text available: pdf(602.56 KB) Additional Information: full citation , abstract , references, citings, index terms

CoIL challenge 2000 was a supervised learning contest that attracted 43 entries. The
authors of 29 entries later wrote explanations of their work. This paper discusses these 
reports and reaches three main conclusions. First, naive Bayesian classifiers remain 
competitive in practice: they were used by both the winning entry and the next best entry. 
Second, identifying feature interactions correctly is important for maximizing predictive 
accuracy: this was the difference between the winning classi ... 

13 Bioinformatics (BIO): Comparing approaches to predict transmembrane domains in 
protein sequences 

Paul Davidsson, Johan Hagelback, Kenny Svensson 

March 2005 Proceedings of the 2005 ACM symposium on Applied computing 

Full text available: pdf(271.98 KB) Additional Information: full citation , abstract , references , index terms

There are today several systems for predicting transmembrane domains in membrane 
protein sequences. As they are based on different classifiers as well as different pre- and 
post-processing techniques, it is very difficult to evaluate the performance of the particular 
classifier used. We have developed a system called MemMiC for predicting transmembrane 
domains in protein sequences with the possibility to choose between different approaches to 
pre- and post-processing as well as different classif ... 

Keywords: classifiers, learning, protein sequences 



14 Fuzzy RuleNet: an artificial neural network model for fuzzy classification 
Nadine Tschichold-Gurman 

April 1994 Proceedings of the 1994 ACM symposium on Applied computing 

Full text available: pdf(571.46 KB) Additional Information: full citation , references , index terms



15 Algorithmic transformations in the implementation of K-means clustering on
reconfigurable hardware 

Mike Estlick, Miriam Leeser, James Theiler, John J. Szymanski 

February 2001 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays

Full text available: pdf(255.95 KB) Additional Information: full citation , abstract , references , citings , index terms

In mapping the k-means algorithm to FPGA hardware, we examined algorithm level 
transforms that dramatically increased the achievable parallelism. We apply the k-means 
algorithm to multi-spectral and hyper-spectral images, which have tens to hundreds of 
channels per pixel of data. K-means is an iterative algorithm that assigns to each
pixel a label indicating which of K clusters the pixel belongs to. K-means is a common
solution to the segmentation of multi ...



16 Evaluating a class of distance-mapping algorithms for data mining and clustering
Jason Tsong-Li Wang, Xiong Wang, King-Ip Lin, Dennis Shasha, Bruce A. Shapiro, Kaizhong 
Zhang 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(589.01 KB) Additional Information: full citation , references , citings , index terms



17 A fast algorithm for constructing sparse Euclidean spanners
Gautam Das, Giri Narasimhan 

June 1994 Proceedings of the tenth annual symposium on Computational geometry 

Full text available: pdf(705.83 KB) Additional Information: full citation , abstract , references , citings , index terms

Let G=(V,E) be a n-vertex connected graph with positive edge weights. A subgraph G' is a
t-spanner if for all u,v ∈ V, the distance between u and v in the subgraph is at most t times
the corresponding distance in G. We design an O(nlog-

18 Real world applications: Nonlinear feature extraction using a neuro genetic hybrid
Yung-Keun Kwon, Byung-Ro Moon 

June 2005 Proceedings of the 2005 conference on Genetic and evolutionary 
computation GECCO '05 

Full text available: pdf(361.05 KB) Additional Information: full citation , abstract , references , index terms

Feature extraction is a process that extracts salient features from observed variables. It is 
considered a promising alternative to overcome the problems of weight and structure 
optimization in artificial neural networks. There were many nonlinear feature extraction 
methods using neural networks but they still have the same difficulties arisen from the fixed 
network topology. In this paper, we propose a novel combination of genetic algorithm and 
feedforward neural networks for nonlinear feature ... 

Keywords: feature extraction, function approximation, neuro-genetic hybrid 



19 Research track papers: Clustering time series from ARMA models with clipped data
A. J. Bagnall, G. J. Janacek 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(305.69 KB) Additional Information: full citation , abstract , references , index terms

Clustering time series is a problem that has applications in a wide variety of fields, and has 
recently attracted a large amount of research. In this paper we focus on clustering data 
derived from Autoregressive Moving Average (ARMA) models using k-means and k-medoids 
algorithms with the Euclidean distance between estimated model parameters. We justify our 
choice of clustering technique and distance metric by reproducing results obtained in related 
research. Our research aim is to assess the aff ... 

Keywords: ARMA, clustering, time series 



20 Research track posters: A generalized maximum entropy approach to bregman co-clustering and matrix approximation

Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, Dharmendra S. Modha 
August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining

Full text available: pdf(166.70 KB) Additional Information: full citation , abstract , references , index terms

Co-clustering is a powerful data mining technique with varied applications such as text 
clustering, microarray analysis and recommender systems. Recently, an information- 
theoretic co-clustering approach applicable to empirical joint probability distributions was 
proposed. In many situations, co-clustering of more general matrices is desired. In this 
paper, we present a substantially generalized co-clustering framework wherein any Bregman 
divergence can be used in the objective function, and vari ... 

Keywords: Bregman divergences, co-clustering, matrix approximation 
21 Computing Euclidean maximum spanning trees
C. Monma, M. Paterson, S. Suri, F. Yao 

January 1988 Proceedings of the fourth annual symposium on Computational geometry 

Additional Information: full citation , abstract , references , citings , index 
terms 



Full text available: pdf(665.62 KB)



An algorithm is presented for finding a maximum-weight spanning tree of a set of n points in 
the Euclidean plane, where the weight of an edge (pi, pj) equals the Euclidean distance 
between the points pi and pj. The algorithm runs in time O(n log n) and requires O(n)
space. If the points are vertices of a convex polygon (given in or ...

22 Spatial Database Clustering: Fast spatial clustering with different metrics and in the presence of obstacles
Vladimir Estivill-Castro, Ickjai Lee 

November 2001 Proceedings of the 9th ACM international symposium on Advances in 
geographic information systems 

Full text available: pdf(1.77 MB) Additional Information: full citation , abstract , references , index terms

In many GIS settings, the Euclidean metric is not applicable as the model for distance 
between points. Other geometric models are needed in many practical scenarios, for which 
urban geography is a common example. Recently, Estivill-Castro and Lee [8] proposed an 
effective and efficient boundary-based clustering method overcoming drawbacks of 
traditional spatial clustering, but has a geometric focus. By factoring out the topological 
aspects of the method we obtain a generic boundary-based cluster ... 

23 Polynomial-time approximation schemes for geometric min-sum median clustering 
Rafail Ostrovsky, Yuval Rabani 

March 2002 Journal of the ACM (JACM), Volume 49 issue 2 

Additional Information: full citation , abstract , references , citings , index
terms 



Full text available: pdf(257.54 KB)



The Johnson-Lindenstrauss lemma states that n points in a high-dimensional Hilbert space 
can be embedded with small distortion of the distances into an O(log n) dimensional space
by applying a random linear transformation. We show that similar (though weaker) 
properties hold for certain random linear transformations over the Hamming cube. We use 
these transformations to solve NP-hard clustering problems in the cube as well as in 
geometric settings. More specifically, ... 
Keywords: Clustering, high-dimensional data, polynomial-time approximation schemes
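
The Johnson-Lindenstrauss statement quoted in the abstract above can be illustrated with a small experiment. The sketch below is not the construction from the paper (whose contribution concerns the Hamming cube); it assumes the textbook variant in which the random linear transformation is a matrix of independent Gaussian entries scaled by 1/sqrt(k). The names `random_projection`, `project`, and `dist`, and the dimensions 500 and 200, are hypothetical choices made only for this illustration.

```python
import math
import random


def random_projection(dim_in, dim_out, seed=0):
    """Build a dim_out x dim_in Gaussian matrix scaled by 1/sqrt(dim_out).

    Assumption: i.i.d. N(0, 1) entries, one standard way to realise a
    Johnson-Lindenstrauss style random linear map.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)]
            for _ in range(dim_out)]


def project(matrix, x):
    """Apply the linear map to a single point."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def dist(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Two arbitrary points in 500 dimensions, projected down to 200 dimensions.
rng = random.Random(1)
d, k = 500, 200
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [rng.uniform(-1.0, 1.0) for _ in range(d)]
A = random_projection(d, k)

# Ratio of projected distance to original distance; it should be close to 1.
ratio = dist(project(A, x), project(A, y)) / dist(x, y)
```

With these settings the ratio typically lands within a few percent of 1; shrinking `k` widens the spread, which is the trade-off the lemma quantifies.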



24 Research track papers: A probabilistic framework for semi-supervised clustering 
Sugato Basu, Mikhail Bilenko, Raymond J. Mooney 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(187.51 KB) Additional Information: full citation , abstract, references , index terms



Unsupervised clustering can be significantly improved using supervision in the form of 
pairwise constraints, i.e., pairs of instances labeled as belonging to same or different 
clusters. In recent years, a number of algorithms have been proposed for enhancing 
clustering quality by employing such supervision. Such methods use the constraints to either 
modify the objective function, or to learn the distance measure. We propose a probabilistic 
model for semi-supervised clustering based on Hidden Mar ... 

Keywords: distance metric learning, hidden Markov random fields, semi-supervised 
clustering 



25 Research sessions: clustering: Clustering objects on a spatial network 
Man Lung Yiu, Nikos Mamoulis 

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(867.67 KB) Additional Information: full citation , abstract , references 

Clustering is one of the most important analysis tasks in spatial databases. We study the 
problem of clustering objects, which lie on edges of a large weighted spatial network. The 
distance between two objects is defined by their shortest path distance over the network. 
Past algorithms are based on the Euclidean distance and cannot be applied for this setting. 
We propose variants of partitioning, density-based, and hierarchical methods. Their 
effectiveness and efficiency is evaluated for collect ... 

26 Poster papers: Clustering seasonality patterns in the presence of errors 
Mahesh Kumar, Nitin R. Patel, Jonathan Woo 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(734.23 KB) Additional Information: full citation , abstract , references , index terms

Clustering is a very well studied problem that attempts to group similar data points. Most 
traditional clustering algorithms assume that the data is provided without measurement 
error. Often, however, real world data sets have such errors and one can obtain estimates of 
these errors. We present a clustering method that incorporates information contained in 
these error estimates. We present a new distance function that is based on the distribution 
of errors in data. Using a Gaussian model for err ... 

Keywords: Gaussian distribution, clustering, distance function, forecasting, product life 
cycle, seasonality, time-series 



27 Data bubbles: quality preserving performance boosting for hierarchical clustering 
Markus M. Breunig, Hans-Peter Kriegel, Peer Kroger, Jorg Sander 

May 2001 ACM SIGMOD Record , Proceedings of the 2001 ACM SIGMOD international conference on Management of data, volume 30 issue 2

Full text available: pdf(397.09 KB) Additional Information: full citation , abstract , references , citings , index terms

In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS)
to extremely large databases by utilizing data compression methods (such as BIRCH or 
random sampling). We propose a three step procedure: 1) compress the data into suitable 
representative objects; 2) apply the hierarchical clustering algorithm only to these objects; 
3) recover the clustering structure for the whole data set, based on the result for the 
compressed data. The key issue in this approach is ... 

Keywords: clustering, data compression, database mining, sampling 



28 Clustering in large graphs and matrices
P. Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay 

January 1999 Proceedings of the tenth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(927.53 KB) Additional Information: full citation , references, citings , index terms



29 Subquadratic approximation algorithms for clustering problems in high dimensional
spaces 

Allan Borodin, Rafail Ostrovsky, Yuval Rabani 

May 1999 Proceedings of the thirty-first annual ACM symposium on Theory of 
computing 

Full text available: pdf(699.16 KB) Additional Information: full citation , references , citings , index terms



30 A local search approximation algorithm for k-means clustering
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, 
Angela Y. Wu 

June 2002 Proceedings of the eighteenth annual symposium on Computational 
geometry 

Full text available: pdf(161.86 KB) Additional Information: full citation , abstract , references , citings , index terms

In k-means clustering we are given a set of n data points in d-dimensional space R^d and an
integer k, and the problem is to determine a set of k points in R^d, called centers, to
minimize the mean squared distance from each data point to its nearest center. No exact 
polynomial-time algorithms are known for this problem. Although asymptotically efficient 
approximation algorithms exist, these algorithms are not practical due to t ... 

Keywords: approximation algorithms, clustering, computational geometry, k-means, local 
search 
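
The objective stated in the abstract above (the mean squared distance from each data point to its nearest center) can be written down directly. The following is a minimal sketch that only evaluates that objective for given centers; it is not the local search algorithm of the paper, and the function name `kmeans_cost` and the toy data are hypothetical.

```python
def kmeans_cost(points, centers):
    """Mean squared Euclidean distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance to every center; keep the smallest one.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

cost_two_centers = kmeans_cost(points, centers)       # each point is 1 unit away
cost_one_center = kmeans_cost(points, [(5.0, 1.0)])   # one shared center
```

Comparing `cost_two_centers` with `cost_one_center` shows how the objective rewards placing a center in each group, which is the quantity any k-means heuristic tries to reduce.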



31 Research sessions: data mining: Clustering by pattern similarity in large data sets 
Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu 

June 2002 Proceedings of the 2002 ACM SIGMOD international conference on Management of data

Full text available: pdf(1.09 MB) Additional Information: full citation , abstract , references , citings , index terms

Clustering is the process of grouping a set of objects into classes of similar objects. Although 
definitions of similarity vary from one clustering model to another, in most of these models 
the concept of similarity is based on distances, e.g., Euclidean distance or cosine distance. 
In other words, similar objects are required to have close values on at least a set of 
dimensions. In this paper, we explore a more general type of similarity. Under the pCluster 
model we proposed, two objects ...

32 Data clustering: a review 

A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), volume 31 issue 3 

Full text available: pdf(636.24 KB) Additional Information: full citation, abstract , references , citings, index terms, review

Clustering is the unsupervised classification of patterns (observations, data items, or feature 
vectors) into groups (clusters). The clustering problem has been addressed in many 
contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult 
problem combinatorially, and differences in assumptions and contexts in different 
communities has made the transfer of useful generic co ... 

Keywords: cluster analysis, clustering applications, exploratory data analysis, incremental 
clustering, similarity indices, unsupervised learning 



33 Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract)

Mary Inaba, Naoki Katoh, Hiroshi Imai 

June 1994 Proceedings of the tenth annual symposium on Computational geometry 

Full text available: pdf(760.37 KB) Additional Information: full citation , abstract , references , citings , index terms

In this paper we consider the k-clustering problem for a set S of n points p_i = (x_i) in the
d-dimensional space with variance-based errors as clustering criteria, motivated from the color
quantization problem of computing a color lookup table for frame buffer display. As the 
inter-cluster criterion to minimize, the sum on in ... 

34 Clustering for edge-cost minimization (extended abstract)
Leonard J. Schulman 

May 2000 Proceedings of the thirty-second annual ACM symposium on Theory of 
computing 

Full text available: pdf(920.13 KB) Additional Information: full citation , references , citings , index terms



35 Similarity Search: Effective nearest neighbor indexing with the euclidean metric 
Sang-Wook Kim, Charu C. Aggarwal, Philip S. Yu 

October 2001 Proceedings of the tenth international conference on Information and 
knowledge management 

Full text available: pdf(2.18 MB) Additional Information: full citation , abstract , references , index terms

The nearest neighbor search is an important operation widely-used in multimedia databases. 
In higher dimensions, most of previous methods for nearest neighbor search become 
inefficient and require to compute nearest neighbor distances to a large fraction of points in 
the space. In this paper, we present a new approach for processing nearest neighbor search 
with the Euclidean metric, which searches over only a small subset of the original space. 
This approach effectively approximates clusters by ... 

Keywords: Euclidean metric, high dimensional indexes, multimedia databases, nearest 
neighbor queries, similarity search 
36 Approximation algorithms for the mobile piercing set problem with applications to
clustering in ad-hoc networks 
Hai Huang, Andrea W. Richa, Michael Segal 
April 2004 Mobile Networks and Applications, Volume 9 issue 2 

Full text available: pdf(186.57 KB) Additional Information: full citation , abstract, references , index terms , review

The main contributions of this paper are two-fold. First, we present a simple, general 
framework for obtaining efficient constant-factor approximation algorithms for the mobile 
piercing set (MPS) problem on unit-disks for standard metrics in fixed dimension vector 
spaces. More specifically, we provide low constant approximations for L1 and L∞ norms on a
d-dimensional space, for any fixed d > 0, and for the L2 norm on tw ... 

Keywords: distributed algorithms, wireless networks 



37 Spatial Database Clustering: New methods for topological clustering and spatial 
access in object-oriented 3D databases 
M. Breunig, A. B. Cremers, W. Muller, J. Siebeck 

November 2001 Proceedings of the 9th ACM international symposium on Advances in 
geographic information systems 

Full text available: pdf(1.71 MB) Additional Information: full citation , abstract , index terms

The data handling component of today's geographical information systems still only 
considers the management of two-dimensional data. However, in the geosciences as well as 
in commercial planning fields there is an increasing need to manage large amounts of 2.5D- 
and 3D-data. On the one hand, mobile telephony providers require digital landform data to 
maintain overall communication service networks. On the other hand, geoscientists and 
engineers need 3D surface and solid data to research dynamic ... 

Keywords: 3D, GeoToolKit, geo-database, spatial access algorithms, spatial clustering 



38 Integrating constraints and metric learning in semi-supervised clustering
Mikhail Bilenko, Sugato Basu, Raymond J. Mooney 

July 2004 Proceedings of the twenty-first international conference on Machine 
learning ICML '04 

Full text available: pdf(236.62 KB) Additional Information: full citation , abstract , references , citings

Semi-supervised clustering employs a small amount of labeled data to aid unsupervised 
learning. Previous work in the area has utilized supervised data in one of two approaches: 1) 
constraint-based methods that guide the clustering algorithm towards a better grouping of 
the data, and 2) distance-function learning methods that adapt the underlying similarity 
metric used by the clustering algorithm. This paper provides new methods for the two 
approaches as well as presents a new semi-supervised clu ... 

39 Session 7B: On coresets for k-means and k-median clustering
Sariel Har-Peled, Soham Mazumdar 

June 2004 Proceedings of the thirty-sixth annual ACM symposium on Theory of 
computing 

Full text available: pdf(223.26 KB) Additional Information: full citation , abstract, references , citings, index terms

In this paper, we show the existence of small coresets for the problems of computing
k-median and k-means clustering for points in low dimension. In other words, we show that
given a point set P in R^d, one can compute a weighted set S ⊆ P, of size O(k ε^{-d} log n), such
that one can compute the k-median/means clustering on S instead of on P, and get an
(1+ε)-approximation. As a result, we improve the fastest known algorithms for
approximate k-means ...

Keywords: Coreset, clustering, k-means, k-median, streaming 



40 Industry/government track posters: Programming the K-means clustering algorithm in SQL

Carlos Ordonez 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(157.05 KB) Additional Information: full citation , abstract , references, index terms

Using SQL has not been considered an efficient and feasible way to implement data mining 
algorithms. Although this is true for many data mining, machine learning and statistical 
algorithms, this work shows it is feasible to get an efficient SQL implementation of the well- 
known K-means clustering algorithm that can work on top of a relational DBMS. The article 
emphasizes both correctness and performance. From a correctness point of view the article 
explains how to compute Euclidean distance, near ... 

Keywords: K-means, SQL, clustering 
41 The subresultant and clusters of close roots
Tateaki Sasaki 

August 2003 Proceedings of the 2003 international symposium on Symbolic and 
algebraic computation 

Full text available: pdf(274.18 KB) Additional Information: full citation, abstract, references, index terms

This paper investigates the subresultant of univariate polynomials from the viewpoint of 
close roots. First, we derive formulas which express the subresultant and its cofactors in the 
root-differences. Then, we consider the case that the given polynomials contain one or more 
clusters of mutually close roots of closeness δ. We derive formulas showing the
dependences of the coefficients of subresultant and its cofactors on δ and the number of
clusters. Finally, we determine the magnitude ... 

42 Indexing large metric spaces for similarity search queries 
Tolga Bozkaya, Meral Ozsoyoglu 

September 1999 ACM Transactions on Database Systems (TODS), Volume 24 issue 3 

Full text available: pdf(281.78 KB) Additional Information: full citation, abstract, references, citings, index terms, review

One of the common queries in many database applications is finding approximate matches 
to a given query item from a collection of data items. For example, given an image 
database, one may want to retrieve all images that are similar to a given query image. 
Distance-based index structures are proposed for applications where the distance 
computations between objects of the data domain are expensive (such as high-dimensional 
data) and the distance function is metric. In this paper we consider ... 



43 Voronoi diagrams — a survey of a fundamental geometric data structure 
Franz Aurenhammer 

September 1991 ACM Computing Surveys (CSUR), Volume 23 issue 3 

Full text available: pdf(5.18 MB) Additional Information: full citation, references, citings, index terms



Keywords: cell complex, clustering, combinatorial complexity, convex hull, crystal 
structure, divide-and-conquer, geometric data structure, growth model, higher dimensional 
embedding, hyperplane arrangement, k-set, motion planning, neighbor searching, object 
modeling, plane-sweep, proximity, randomized insertion, spanning tree, triangulation 

44 Efficient algorithms for geometric optimization
Pankaj K. Agarwal, Micha Sharir 

December 1998 ACM Computing Surveys (CSUR), Volume 30 issue 4 

Full text available: pdf(577.74 KB) Additional Information: full citation, abstract, references, citings, index terms

We review the recent progress in the design of efficient algorithms for various problems in 
geometric optimization. We present several techniques used to attack these problems, such 
as parametric searching, geometric alternatives to parametric searching, prune-and-search 
techniques for linear programming and related problems, and LP-type problems and their 
efficient solution. We then describe a wide range of applications of these and other 
techniques to numerous problems in geometric optim ... 

Keywords: clustering, collision detection, linear programming, matrix searching, parametric 
searching, proximity problems, prune-and-search, randomized algorithms 



45 Session 1: Approximation algorithms for the mobile piercing set problem with
applications to clustering in ad-hoc networks
Hai Huang, Andrea W. Richa, Michael Segal 

September 2002 Proceedings of the 6th international workshop on Discrete algorithms
and methods for mobile computing and communications

Full text available: pdf(273.11 KB) Additional Information: full citation, abstract, references, citings, index terms

The main contributions of this paper are two-fold. First, we present a simple, general 
framework for obtaining efficient constant-factor approximation algorithms for the mobile 
piercing set (MPS) problem on unit-disks for standard metrics in fixed dimension vector 
spaces. More specifically, we provide low constant approximations for L1- and L∞-norms on a
d-dimensional space, for any fixed d > 0, and for the L2-norm on 2- and 3-dimensional s ... 

Keywords: approximation algorithms, clustering, distributed protocols, mobile ad-hoc 
networks, piercing set 



46 The analysis of a simple k-means clustering algorithm
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine Piatko, Ruth Silverman, 
Angela Y. Wu 

May 2000 Proceedings of the sixteenth annual symposium on Computational 
geometry 

Full text available: pdf(1.24 MB) Additional Information: full citation, references, citings, index terms



47 Incremental clustering and dynamic information retrieval
Moses Charikar, Chandra Chekuri, Tomas Feder, Rajeev Motwani 

May 1997 Proceedings of the twenty-ninth annual ACM symposium on Theory of 
computing 

Full text available: pdf(1.58 MB) Additional Information: full citation, references, citings, index terms



48 Exact and approximation algorithms for clustering
Pankaj K. Agarwal, Cecilia M. Procopiuc 

January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete 

49 Routing and transport: Geometric spanner for routing in mobile networks
Jie Gao, Leonidas J. Guibas, John Hershberger, Li Zhang, An Zhu 

October 2001 Proceedings of the 2nd ACM international symposium on Mobile ad hoc 
networking & computing 

Full text available: pdf(275.36 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a new routing graph, the Restricted Delaunay Graph (RDG), for ad hoc 
networks. Combined with a node clustering algorithm, RDG can be used as an underlying
graph for geographic routing protocols. This graph has the following attractive properties: 
(1) it is a planar graph; (2) between any two nodes there exists a path in the RDG whose 
length, whether measured in terms of topological or Euclidean distance, is only a constant 
times the optimum length possible; and (3) the graph can be mai ... 

50 Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and
Nonmetric Spaces
Michael Quist, Golan Yona 

December 2004 The Journal of Machine Learning Research, volume 5 

Full text available: pdf(508.39 KB) Additional Information: full citation, abstract, index terms

We present a novel approach for embedding general metric and nonmetric spaces into low- 
dimensional Euclidean spaces. As opposed to traditional multidimensional scaling 
techniques, which minimize the distortion of pairwise distances, our embedding algorithm 
seeks a low-dimensional representation of the data that preserves the structure (geometry) 
of the original data. The algorithm uses a hybrid criterion function that combines the 
pairwise distortion with what we call the geometric distortion. T ... 

51 Similarity querying II: Qcluster: relevance feedback using adaptive clustering for
content-based image retrieval
Deok-Hwan Kim, Chin-Wan Chung 

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(2.15 MB) Additional Information: full citation, abstract, references, citings, index terms

The learning-enhanced relevance feedback has been one of the most active research areas 
in content-based image retrieval in recent years. However, few methods using the relevance 
feedback are currently available to process relatively complex queries on large image 
databases. In the case of complex image queries, the feature space and the distance 
function of the user's perception are usually different from those of the system. This 
difference leads to the representation of a query with multiple ... 

Keywords: classification, cluster-merging, content-based image retrieval, image database, 
relevance feedback 



52 Clustering hypertext with applications to web searching 
Dharmendra S. Modha, W. Scott Spangler 

May 2000 Proceedings of the eleventh ACM on Hypertext and hypermedia 

Full text available: pdf(300.31 KB) Additional Information: full citation, references, citings, index terms

Keywords: cluster annotation, feature combination, high-dimensional data, hyperlinks,
sparse data, toric k-means algorithm, vector space model 



53 Clustering and singular value decomposition for approximate indexing in high 
dimensional spaces 

Alexander Thomasian, Vittorio Castelli, Chung-Sheng Li 

November 1998 Proceedings of the seventh international conference on Information 
and knowledge management 

Full text available: pdf(1.12 MB) Additional Information: full citation, references, citings, index terms



54 KM-1 (knowledge management): clustering I: Using bi-modal alignment and clustering
techniques for documents and speech thematic segmentations

Dalila Mekhaldi, Denis Lalanne, Rolf Ingold 

November 2004 Proceedings of the thirteenth ACM conference on Information and 
knowledge management 

Full text available: pdf(463.76 KB) Additional Information: full citation, abstract, references, index terms

In this paper, we describe a new method for a simultaneous thematic segmentation of the 
meeting dialogs and the documents discussed or visible throughout the meeting. This bi- 
modal method is suitable for multimodal applications that are centered on documents, such 
as meetings and lectures, where documents can be aligned with meeting dialogs. Bringing 
into play this alignment, our bi-modal segmentation method first transforms its results into 
a set of nodes in a 2D graph space, where the two a ... 

Keywords: k-means clustering, thematic alignment, thematic segmentation 

55 Research sessions: clustering: Computing Clusters of Correlation Connected objects
Christian Böhm, Karin Kailing, Peer Kröger, Arthur Zimek

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on 
Management of data 

Full text available: pdf(645.32 KB) Additional Information: full citation, abstract, references

The detection of correlations between different features in a set of feature vectors is a very 
important data mining task because correlation indicates a dependency between the 
features or some association of cause and effect between them. This association can be 
arbitrarily complex, i.e. one or more features might be dependent from a combination of 
several other features. Well-known methods like the principal components analysis (PCA) 
can perfectly find correlations which are global, linear, no ... 

56 Paper session I: techniques: Multi-vector feature space based on pseudo-euclidean 
space and oblique basis for similarity searches of images 
Yasuo Yamane 

June 2004 Proceedings of the 1st international workshop on Computer vision meets 
databases CVDB '04 

Full text available: pdf(746.64 KB) Additional Information: full citation, abstract, references, citings

Investigators have tried to increase the precision of similarity searches of images by using 
distance functions that reflect the similarity of features. When the quadratic-form distance is 
used, however, dissimilar images can be judged to be similar. We therefore propose that the 
similarity of images be evaluated using a measure of distance in a multi-vector feature 
space based on pseudo-Euclidean space and an oblique basis (MVPO). In this space an 
image is represented by a set of vectors each o ... 

57 Flow Field Clustering via Algebraic Multigrid
M. Griebel, T. Preusser, M. Rumpf, M. A. Schweitzer, A. Telea 
October 2004 Proceedings of the conference on Visualization '04 

Full text available: pdf(775.74 KB) Additional Information: full citation, abstract

We present a novel multiscale approach for flow visualization. We define a local alignment 
tensor that encodes a measure for alignment to the direction of a given flow field. This 
tensor induces an anisotropic differential operator on the flow domain, which is discretized 
with a standard finite element technique. The entries of the corresponding stiffness matrix 
represent the anisotropically weighted couplings of adjacent nodes of the domain mesh. We 
use an algebraic multigrid algorithm to gener ... 

Keywords: algebraic multigrid, multiscale visualization, flow visualization 



58 Session 7B: Bypassing the embedding: algorithms for low dimensional metrics 
Kunal Talwar 

June 2004 Proceedings of the thirty-sixth annual ACM symposium on Theory of 
computing 

Full text available: pdf(249.41 KB) Additional Information: full citation, abstract, references, index terms

The doubling dimension of a metric is the smallest k such that any ball of radius 2r can be 
covered using 2^k balls of radius r. This concept for abstract metrics has been proposed as a
natural analog to the dimension of a Euclidean space. If we could embed metrics with low 
doubling dimension into low dimensional Euclidean spaces, they would inherit several 
algorithmic and structural properties of the Euclidean spaces. Unfortunately however, such a
restriction on dimension does ... 

Keywords: PTAS, TSP, distance labels, doubling metrics, routing schemes 

59 Multimedia and visualization (MV): Similarity between Euclidean and cosine angle
distance for nearest neighbor queries 
Gang Qian, Shamik Sural, Yuelong Gu, Sakti Pramanik 

March 2004 Proceedings of the 2004 ACM symposium on Applied computing 

Full text available: pdf(878.42 KB) Additional Information: full citation, abstract, references

Understanding the relationship among different distance measures is helpful in choosing a 
proper one for a particular application. In this paper, we compare two commonly used 
distance measures in vector models, namely, Euclidean distance (EUD) and cosine angle 
distance (CAD), for nearest neighbor (NN) queries in high dimensional data spaces. Using 
theoretical analysis and experimental results, we show that the retrieval results based on 
EUD are similar to those based on CAD when dimension is hig ... 

Keywords: Content based image retrieval, Cosine angle distance, Euclidean distance, Inter- 
feature normalization, vector model 
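
The relationship between the two measures can be illustrated with a minimal sketch. It is not the paper's analysis, which concerns high-dimensional data in general; it only covers the special case of unit-length vectors, where the squared Euclidean distance equals 2 - 2 cos(angle), so both measures rank neighbors identically. The function names, the random data, and the choice of 64 dimensions are assumptions made for illustration.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance (EUD) between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_angle(a, b):
    """Angle in radians between two vectors (CAD)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nearest(query, data, dist, k):
    """Indices of the k items of `data` closest to `query` under `dist`."""
    return sorted(range(len(data)), key=lambda i: dist(query, data[i]))[:k]

# Hypothetical data: 200 random unit vectors in 64 dimensions.
rng = random.Random(0)
data = [unit([rng.random() for _ in range(64)]) for _ in range(200)]
query = unit([rng.random() for _ in range(64)])

top_eud = nearest(query, data, euclidean, 10)
top_cad = nearest(query, data, cosine_angle, 10)
```

For vectors that are not normalized, the two rankings can differ, which is the situation the abstract addresses.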



60 Index-driven similarity search in metric spaces 
Gisli R. Hjaltason, Hanan Samet 

December 2003 ACM Transactions on Database Systems (TODS), volume 28 issue 4 

Full text available: pdf(650.64 KB) terms

Similarity search is a very important operation in multimedia databases and other database 
applications involving complex objects, and involves finding objects in a data set S similar to
a query object q, based on some similarity measure. In this article, we focus on methods for 
similarity search that make the general assumption that similarity is represented with a 
distance metric d. Existing methods for handling similarity search in this setting typically fall 
into one of ... 

Keywords: Hierarchical metric data structures, distance-based indexing, nearest neighbor
queries, range queries, ranking, similarity searching 